Who am I?

• Maintainer of Linux man-pages project since 2004
    • ≈1050 pages, mainly for system calls & C library functions
    • https://www.kernel.org/doc/man-pages/
    • (I wrote a lot of those pages...)
• Author of a book on the Linux programming interface
    • http://man7.org/tlpi/
• Trainer/writer/engineer
    • http://man7.org/training/
• Email: mtk@man7.org
  Twitter: @mkerrisk

Time is short

• Normally, I would spend several hours on this topic
• Many details left out, but I hope to give an idea of big picture
• We’ll go fast

(Traditional) superuser and set-UID-root programs

• Traditional UNIX privilege model divides users into two groups:
    • Normal users, subject to privilege checking based on UIDs and GIDs
    • Superuser (UID 0) bypasses many of those checks
• Traditional mechanism for giving privilege to unprivileged users is set-UID-root program

    # chown root prog
    # chmod u+s prog

    • When executed, process assumes UID of file owner
        • => process gains privileges of superuser
    • Powerful, but dangerous

The traditional privilege model is a problem

• Coarse granularity of traditional privilege model is a problem:
    • E.g., say we want to give a program the power to change system time
        • Must also give it power to do everything else root can do
        • => No limit on possible damage if program is compromised
• Capabilities are an attempt to solve this problem

Background: capabilities

• Capabilities: divide power of superuser into small pieces
    • 38 capabilities as at Linux 5.4 (see capabilities(7))
    • Examples:
        • CAP_DAC_OVERRIDE: bypass all file permission checks
        • CAP_SYS_ADMIN: do (too) many different sysadmin operations
        • CAP_SYS_TIME: change system time
• Instead of set-UID-root programs, have programs with one/a few attached capabilities
    • Attached using setcap(8) (needs CAP_SETFCAP capability!)
    • When program is executed => process gets those capabilities
    • Program is weaker than set-UID-root program
        • => less dangerous if compromised

Background: capabilities

• Summary:
    • Processes can have capabilities (subset of power of root)
    • Files can have attached capabilities, which are given to process that executes program
    • Privileged binaries/processes using capabilities are less dangerous if compromised

Namespaces

• A namespace (NS) “wraps” some global system resource to provide resource isolation
• Linux supports multiple NS types
    • Seven currently, and counting...

Each NS isolates some kind of resource(s)

• Mount NS: isolate mount point list
    • (CLONE_NEWNS; 2.4.19, 2002)
• UTS NS: isolate system identifiers (e.g., hostname)
    • (CLONE_NEWUTS; 2.6.19, 2006)
• IPC NS: isolate System V IPC and POSIX MQ objects
    • (CLONE_NEWIPC; 2.6.19, 2006)
• PID NS: isolate PID number space
    • (CLONE_NEWPID; 2.6.24, 2008)
• Network NS: isolate NW resources (firewall & routing rules, socket port numbers, /proc/net, /sys/class/net, ...)
    • (CLONE_NEWNET; ≈2.6.29, 2009)

Each NS isolates some kind of resource(s)

• User NS: isolate user ID and group ID number spaces
    • (CLONE_NEWUSER; 3.8, 2013)
• Cgroup NS: virtualize (isolate) certain cgroup pathnames
    • (CLONE_NEWCGROUP; 4.6, 2016)

Namespaces

• For each NS type:
    • Multiple instances of NS may exist on a system
    • At system boot, there is one instance of each NS type: the initial namespace
    • A process resides in one NS instance (of each of NS types)
    • To processes inside NS instance, it appears that only they can see/modify corresponding global resource
        • (They are unaware of other instances of resource)

UTS namespaces (CLONE_NEWUTS)

• UTS NSs are simplest NS, and so provide an easy example
• Isolate two system identifiers returned by uname(2)
    • nodename: system hostname (set by sethostname(2))
    • domainname: NIS domain name (set by setdomainname(2))
• Container configuration scripts might tailor their actions based on these IDs
    • E.g., nodename could be used with DHCP, to obtain IP address for container
• “UTS” comes from struct utsname argument of uname(2)
    • Structure name derives from “UNIX Timesharing System”

UTS namespaces (CLONE_NEWUTS)

• Running system may have multiple UTS NS instances
• Processes within single instance access (get/set) same nodename and domainname
• Each NS instance has its own nodename and domainname
    • Changes to nodename and domainname in one NS instance are invisible to other instances

UTS namespace instances

[Figure: UTS NS instances, including the initial UTS NS]

Each UTS NS contains a set of processes (the circles) which see/modify same hostname (and domain name, not shown)

Some “magic” symlinks

• Each process has some symlink files in /proc/PID/ns

    /proc/PID/ns/cgroup     # Cgroup NS instance
    /proc/PID/ns/ipc        # IPC NS instance
    /proc/PID/ns/mnt        # Mount NS instance
    /proc/PID/ns/net        # Network NS instance
    /proc/PID/ns/pid        # PID NS instance
    /proc/PID/ns/user       # User NS instance
    /proc/PID/ns/uts        # UTS NS instance

• One symlink for each of the NS types

Some “magic” symlinks

• Target of symlink tells us which NS instance process is in:

    $ readlink /proc/$$/ns/uts
    uts:[4026531838]

    • Content has form: ns-type:[magic-inode-#]
• Various uses for the /proc/PID/ns symlinks, including:
    • If processes show same symlink target, they are in same NS
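
The symlink comparison can be scripted. The following is a minimal sketch, assuming the target always has the ns-type:[inode] form shown above; the function names are hypothetical, and ns_link() only works on a Linux system with /proc mounted.

```python
import os
import re

# Matches a /proc/PID/ns symlink target such as "uts:[4026531838]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split 'ns-type:[inode]' into a (ns_type, inode_number) tuple."""
    match = _NS_LINK.match(target.strip())
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def same_namespace(target_a, target_b):
    """Two processes are in the same NS instance if the targets are equal."""
    return parse_ns_link(target_a) == parse_ns_link(target_b)


def ns_link(pid, ns_type):
    """Read the symlink target for one NS type of one process (Linux only)."""
    return os.readlink(f"/proc/{pid}/ns/{ns_type}")
```

For example, same_namespace(ns_link(pid1, "uts"), ns_link(pid2, "uts")) reports whether two processes share a UTS NS, provided the caller has permission to read both symlinks.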

APIs and commands

• Programs can use various system calls to work with NSs:
    • clone(2): create new (child) process in new NS(s)
    • unshare(2): create new NS(s) and move caller into it/them
    • setns(2): move calling process to another (existing) NS instance
• There are analogous shell commands:
    • unshare(1): create new NS(s) and execute a command in the NS(s)
    • nsenter(1): enter existing NS(s) and execute a command
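
clone(2) and unshare(2) select NS types with a bit mask of CLONE_NEW* flags. A minimal sketch of building that mask follows; the helper name and the short NS-type keys are hypothetical, and the numeric values are quoted from the Linux <linux/sched.h> header as I recall them, so check them against the header on the target system.

```python
# CLONE_NEW* flag values (assumed to match <linux/sched.h>).
CLONE_FLAGS = {
    "mnt":    0x00020000,  # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts":    0x04000000,  # CLONE_NEWUTS
    "ipc":    0x08000000,  # CLONE_NEWIPC
    "user":   0x10000000,  # CLONE_NEWUSER
    "pid":    0x20000000,  # CLONE_NEWPID
    "net":    0x40000000,  # CLONE_NEWNET
}


def unshare_flags(ns_types):
    """OR together the flags for the requested NS types."""
    flags = 0
    for ns_type in ns_types:
        flags |= CLONE_FLAGS[ns_type]
    return flags
```

The result is what a program would pass as the flags argument of unshare(2); for instance, unshare_flags(["user", "uts"]) corresponds to CLONE_NEWUSER | CLONE_NEWUTS.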

The unshare(1) and nsenter(1) commands

unshare(1) and nsenter(1) have flags for specifying each NS type:

    unshare [options] [command [arguments]]
      -C        Create new cgroup NS
      -i        Create new IPC NS
      -m        Create new mount NS
      -n        Create new network NS
      -p        Create new PID NS
      -u        Create new UTS NS
      -U        Create new user NS

    nsenter [options] [command [arguments]]
      -t PID    PID of process whose NSs should be entered
      -C        Enter cgroup NS of target process
      -i        Enter IPC NS of target process
      -m        Enter mount NS of target process
      -n        Enter network NS of target process
      -p        Enter PID NS of target process
      -u        Enter UTS NS of target process
      -U        Enter user NS of target process
      -a        Enter all NSs of target process

Privilege requirements for creating namespaces

• Creating user NS instances requires no privileges
• Creating instances of other (nonuser) NS types requires privilege
    • CAP_SYS_ADMIN

Demo

• Two terminal windows (sh1, sh2) in initial UTS NS

    sh1$ hostname                   # Show hostname in initial UTS NS
    antero

• In sh2, create new UTS NS, and change hostname

    sh2$ hostname                   # Show hostname in initial UTS NS
    antero
    $ PS1='sh2# ' sudo unshare -u bash
    sh2# hostname bizarro           # Change hostname
    sh2# hostname                   # Verify change
    bizarro

    • Used sudo because we need privilege (CAP_SYS_ADMIN) to create a UTS NS

Demo

• In sh1, verify that hostname is unchanged:

    sh1$ hostname
    antero

• Compare /proc/PID/ns/uts symlinks in two shells

    sh1$ readlink /proc/$$/ns/uts
    uts:[4026531838]

    sh2# readlink /proc/$$/ns/uts
    uts:[4026532855]

• The two shells are in different UTS NSs

Demo

• From sh1, use nsenter(1) to create a new shell that is in same NS as sh2:

• Comparing the symlink values, we can see that this shell (sh3#) is in the second (sh2#) UTS NS

What do user namespaces do?

• Allow per-namespace mappings of UIDs and GIDs
    • I.e., process's UIDs and GIDs inside NS may be different from IDs outside NS
• Interesting use case: process may have nonzero UID outside NS, and UID of 0 inside NS
    • Process has root privileges for operations inside user NS
    • Understanding what that means is our goal...

Relationships between user namespaces

• User NSs have a hierarchical relationship:
    • Parent of a user NS == user NS of process that created this user NS
        • Using clone(2), unshare(2), or unshare(1)
    • Parental relationship determines some rules about how capabilities work
        • (End slides)

A user namespace hierarchy

The first process in a new user NS has root privileges

• When a new user NS is created (unshare(1), clone(2), unshare(2)), first process in NS has all capabilities
• That process has power of superuser!
    • ... but only inside the user NS

UID and GID mappings

• One of first steps after creating a user NS is to define UID and GID mappings for NS
• Defined by writing to 2 files: /proc/PID/uid_map and /proc/PID/gid_map
• For security reasons, there are many rules + restrictions on:
    • How/when files may be updated
    • Who can update the files
    • Way too many details to cover here...
        • See user_namespaces(7)

UID and GID mappings

• Records written to/read from uid_map and gid_map have the form:

    ID-inside-ns   ID-outside-ns   length

    • ID-inside-ns and length define range of IDs inside user NS that are to be mapped
    • ID-outside-ns defines start of corresponding mapped range in “outside” user NS
• Commonly these files are initialized with a single line containing “root mapping”:

    0 1000 1

    • One ID, 0, inside NS maps to ID 1000 in outer NS
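
To make the record format concrete, here is a minimal sketch that parses map records and translates an ID from inside the NS to the outer NS. The function names are hypothetical, and the sketch ignores the kernel's rules about who may write the files and how often.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected three fields, got: {line!r}")
        inside, outside, length = (int(field) for field in fields)
        records.append((inside, outside, length))
    return records


def map_to_outside(records, inside_id):
    """Return the outer-NS ID for inside_id, or None if it is unmapped."""
    for inside, outside, length in records:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None
```

With the root mapping "0 1000 1", ID 0 inside the NS translates to 1000 outside, and every other inside ID is unmapped.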

Example: creating a user NS with “root” mappings

• unshare -U -r creates user NS with root mappings
• Create a user NS with root mappings running new shell, and examine map files:

    $ id                            # Show credentials in current shell
    uid=1000(mtk) gid=1000(mtk)
    $ PS1='uns2$ ' unshare -U -r bash
    uns2$ cat /proc/$$/uid_map
             0       1000          1
    uns2$ cat /proc/$$/gid_map
             0       1000          1

Example: creating a user NS with “root” mappings

• Examine credentials and capabilities of new shell:

    uns2$ id
    uid=0(root) gid=0(root) groups=0(root) ...
    uns2$ egrep '[UG]id|CapEff' /proc/$$/status
    Uid:    0    0    0    0
    Gid:    0    0    0    0
    CapEff: 0000003fffffffff        # Hex bit mask

    • 0x3fffffffff is bit mask with all 38 capability bits set
    • getpcaps from libcap project gives same info more readably
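
The CapEff mask can also be decoded by hand. The sketch below is an illustration only: the function names are hypothetical, and the few capability numbers listed are taken from <linux/capability.h> as I recall them; getpcaps remains the more complete tool.

```python
# A few capability numbers (assumed to match <linux/capability.h>).
CAP_NUMBERS = {
    "CAP_DAC_OVERRIDE": 1,
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_SYS_ADMIN": 21,
    "CAP_SYS_TIME": 25,
    "CAP_SETFCAP": 31,
}


def capability_bits(hex_mask):
    """Return the list of bit positions set in a CapEff-style hex mask."""
    mask = int(hex_mask, 16)
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


def has_cap(hex_mask, cap_name):
    """Check whether the named capability's bit is set in the mask."""
    return bool(int(hex_mask, 16) & (1 << CAP_NUMBERS[cap_name]))
```

Applied to the value above, capability_bits("0000003fffffffff") yields bits 0 through 37, i.e. all 38 capabilities.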

Example: creating a user NS with “root” mappings

• Discover PID of shell in new user NS:

    uns2$ echo $$
    21135

• From a shell in initial user NS, examine credentials of that PID:

    $ grep '[UG]id' /proc/21135/status
    Uid:    1000    1000    1000    1000
    Gid:    1000    1000    1000    1000

I’m superuser! (But, you’re a big fish in a little pond)

• From the shell in new user NS, let's try to change the hostname
    • Requires CAP_SYS_ADMIN

    uns2$ hostname bizarro
    hostname: you must be root to change the host name

• Shell is UID 0 (superuser) and has CAP_SYS_ADMIN
    • What went wrong?
• The new shell is in new user NS, but still resides in initial UTS NS
    • (Remember: hostname is isolated/governed by UTS NS)
    • Let's look at this more closely...

User namespaces and capabilities

• Kernel grants initial process in new user NS a full set of capabilities
• But, those capabilities are available only for operations on objects governed by the new user NS
    • But what does that mean?

User namespaces and capabilities

• We’ve already seen that:
    • There are a number of NS types
    • Each NS type governs some global resource(s); e.g.:
        • UTS: hostname, NIS domain name
        • Mount: set of mount points
        • Network: IP routing tables, port numbers, /proc/net, ...
• Adding to this: each nonuser NS instance is owned by some user NS instance
    • When creating new nonuser NS, kernel marks that NS as owned by user NS of process creating the new NS
• If a process operates on resources governed by nonuser NS:
    • Permission checks are done according to that process’s capabilities in user NS that owns the nonuser NS

User namespaces and capabilities

• To illustrate, let's look at set-up resulting from command:

    unshare -Ur -u <prog>

  (Create process running prog in new user NS with root mappings + new UTS NS)

User namespaces and capabilities: an example

• Example scenario; X was created with: unshare -Ur -u <prog>
• X is in new user NS, with root mappings, and has all capabilities
• X is in a new UTS NS, which is owned by new user NS
• X is in initial instance of all other NS types (e.g., network NS)

User namespaces and capabilities: an example

• Suppose X tries to change hostname (CAP_SYS_ADMIN)
• X is in second UTS NS
• Permissions checked according to X’s capabilities in user NS that owns that UTS NS => succeeds (X has capabilities in that user NS)

User namespaces and capabilities: an example

• Suppose X tries to bind to reserved socket port (CAP_NET_BIND_SERVICE)
• X is in initial network NS
• Permissions checked according to X’s capabilities in user NS that owns network NS => attempt fails (no capabilities in initial user NS)

Discovering namespace relationships

• There are APIs to discover parental relationships between user NSs and ownership relationships between user NSs and nonuser NSs
    • See ioctl_ns(2), http://blog.man7.org/2016/12/introspecting-namespace-relationships.html
• Code example: namespaces/namespaces_of.go
    • Shows namespace memberships of specified processes, in context of user NS hierarchy

Discovering namespace relationships

• Commands to replicate scenario shown in previous slides:

    $ echo $$                       # PID of a shell in initial user NS
    327
    $ unshare -Ur -u sh             # Create new user and UTS NSs
    # echo $$                       # PID of shell in new NSs
    353

• Inspect with namespaces/namespaces_of.go program:

    $ go run namespaces_of.go --namespaces=net,uts 327 353
    user {3 4026531837} <UID: 0>
        [ 327 ]
        net {3 4026532008}
            [ 327 353 ]
        uts {3 4026531838}
            [ 327 ]
        user {3 4026532760} <UID: 1000>
            [ 353 ]
            uts {3 4026532761}
                [ 353 ]

    • Shells are in same network NS, but different UTS+user NSs
    • Second UTS NS is owned by second user NS

User namespaces permit novel applications

• User NSs permit novel applications; for example:
    • Running Linux containers without root privileges
        • Docker, LXC
    • Chrome-style sandboxes without set-UID-root helpers
        • Set-UID-root helpers are (were) used to set up sandbox
        • https://chromium.googlesource.com/chromium/src/+/master/docs/design/sandbox.md
    • User namespace with single UID identity mapping => no superuser possible!
        • E.g., uid_map: 1000 1000 1

User namespaces permit novel applications

• User NSs permit novel applications; more examples:
    • chroot()-based applications for process isolation
        • User NSs allow unprivileged process to create new mount NSs and use chroot()
    • fakeroot-type applications without LD_PRELOAD/dynamic linking tricks
        • (http://fakeroot.alioth.debian.org/)

User namespaces permit novel applications

• User NSs permit novel applications; more examples:
    • Firejail: namespaces + seccomp + capabilities for generalized, simplified sandboxing of any application
        • https://firejail.wordpress.com/, https://lwn.net/Articles/671534/
    • Flatpak: namespaces + seccomp + capabilities + cgroups for application packaging / sandboxing
        • Allows upstream project to provide packaged app with all necessary runtime dependencies
            - No need to rely on packaging in downstream distributions
            - Package once; run on any distribution
        • Desktop applications run seamlessly in GUI
        • http://flatpak.org/, https://lwn.net/Articles/694291/

Namespaces: sources of further information

• My LWN.net article series Namespaces in operation
    • https://lwn.net/Articles/531114/
    • Many example programs and shell sessions...
• Man pages:
    • namespaces(7), user_namespaces(7), mount_namespaces(7), pid_namespaces(7), etc.
    • unshare(1), nsenter(1)
    • capabilities(7)
    • clone(2), unshare(2), setns(2), ioctl_ns(2)
• “Linux containers in 500 lines of code”
    • https://blog.lizzie.io/linux-containers-in-500-loc.html

Thanks!

Michael Kerrisk, Trainer and Consultant
http://man7.org/training/

mtk@man7.org    @mkerrisk

Slides at http://man7.org/conf/
Source code at http://man7.org/tlpi/code/

Combining user namespaces and other namespace types

• Earlier, we noted that CAP_SYS_ADMIN is needed to create nonuser NSs
• So, why can unprivileged user do this:

    $ unshare -U -u -r bash

• Can do this, because kernel first creates user NS, giving child all privileges, so that UTS NS can also be created
• Equivalent to following, but without intervening child process:

    $ unshare -U -r bash            # Child in new user NS
    $ unshare -u bash               # Grandchild in new UTS NS

What about resources not governed by namespaces?

• Some privileged operations relate to resources/features not (yet) governed by any namespace
    • E.g., system time, kernel modules
• Having all capabilities in a (noninitial) user NS doesn’t grant power to perform operations on features not currently governed by any NS
    • E.g., can't change system time or load/unload kernel modules

But what about accessing files (and other resources)?

• Suppose UID 1000 is mapped to UID 0 inside a user NS
• What happens when process with UID 0 inside user NS tries to access file owned by (“true”) UID 0?
• When accessing files, IDs are mapped back to values in initial user NS
    • There is a chain of user NSs starting at NS of process and going back to initial NS
    • Examining the mappings in this chain allows kernel to know “true” UID and GID of processes in user NSs
• Same principle for checks on other resources that have UID+GID owner
    • E.g., various IPC objects
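
The chain of mappings can be modelled in a few lines. This is a conceptual sketch of the idea described above, not a description of kernel data structures; the function names and the list-of-maps representation are assumptions made for illustration.

```python
def map_up(id_map, inside_id):
    """Map an ID one level outward.

    id_map is a list of (inside, outside, length) records, as in uid_map.
    Returns None if inside_id is not covered by any record.
    """
    for inside, outside, length in id_map:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


def id_in_initial_ns(chain, inside_id):
    """Follow a chain of maps from the process's user NS out to the initial NS.

    chain[0] is the map of the process's own user NS, chain[1] the map of
    its parent, and so on; the initial user NS itself has no entry.
    """
    current = inside_id
    for id_map in chain:
        current = map_up(id_map, current)
        if current is None:
            return None  # unmapped somewhere along the chain
    return current
```

With the root mapping "0 1000 1", UID 0 inside the user NS resolves to "true" UID 1000, so a file owned by true UID 0 is treated as belonging to a different user.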

What are the rules that determine the capabilities that a process has in a given user namespace?

User namespace hierarchies

• User NSs exist in a hierarchy
    • Each user NS has a parent, going back to initial user NS
• Parental relationship is established when user NS is created:
    • Parent of a new user NS is user NS of process that created new user NS
• Parental relationship is significant because it plays a part in determining capabilities a process has in user NS

User namespaces and capabilities

• Whether a process has a capability inside a user NS depends on several factors:
    • Whether the capability is present in the process's (effective) capability set
    • Which user NS the process is a member of
    • The process's (effective) UID
    • The (effective) UID of the process that created the user NS
        • At creation time, kernel records eUID of creator as “owner UID” of user NS
    • The parental relationship between user NSs
• (namespaces/ns_capable.c program encapsulates the rules shown on next slide; it answers the question, does process P have capabilities in namespace X?)

Capability rules for user namespaces

1. A process has a capability in a user NS if:
    • it is a member of the user NS, and
    • capability is present in its effective set
    • Note: this rule doesn't grant that capability in parent NS
2. A process that has a capability in a user NS has the capability in all descendant user NSs as well
    • I.e., members of user NS are not isolated from effects of privileged process in parent/ancestor user NS
3. (All) processes in parent user NS that have same eUID as eUID of creator of user NS have all capabilities in the NS
    • At creation time, kernel records eUID of creator as “owner UID” of user NS
    • By virtue of previous rule, capabilities also propagate into all descendant user NSs
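
The three rules above can be expressed as a walk from the target user NS up towards the initial NS. The following is a toy model under stated assumptions, not the ns_capable.c program: the class and function names are hypothetical, UIDs are taken to be "true" (initial-NS) values, and capabilities are plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNS:
    name: str
    parent: Optional["UserNS"] = None  # None for the initial user NS
    owner_uid: int = 0                 # eUID of creator ("owner UID")


@dataclass
class Process:
    user_ns: UserNS
    euid: int
    effective: frozenset = frozenset()  # effective capability set


def has_capability(proc, cap, target):
    """Does proc have cap in user NS target (or in an NS owned by target)?"""
    ns = target
    while ns is not None:
        # Rules 1 and 2: proc is a member of target or of one of its
        # ancestors, and the capability is in its effective set.
        if ns is proc.user_ns:
            return cap in proc.effective
        # Rule 3: proc is in the parent of this NS and its eUID matches the
        # owner UID; the walk upward extends this to descendant NSs.
        if ns.parent is proc.user_ns and proc.euid == ns.owner_uid:
            return True
        ns = ns.parent
    return False


# Scenario from the earlier example: X runs in a child user NS created by
# UID 1000 and holds all capabilities there.
initial = UserNS("initial")
child = UserNS("child", parent=initial, owner_uid=1000)
x = Process(child, euid=1000,
            effective=frozenset({"CAP_SYS_ADMIN", "CAP_NET_BIND_SERVICE"}))
```

In this model, has_capability(x, "CAP_SYS_ADMIN", child) holds (the new UTS NS is owned by the child user NS), while has_capability(x, "CAP_NET_BIND_SERVICE", initial) does not (the network NS is owned by the initial user NS).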

User namespaces are hard (even for kernel developers)

• Developer(s) of user NSs put much effort into ensuring capabilities couldn’t leak from inner user NS to outside NS
    • Potential risk: some piece of kernel code might not be refactored to account for distinct user NSs
    • => unprivileged user who gains all capabilities in child NS might be able to do some privileged operation in outer NS
• User NS implementation touched a lot of kernel code
    • Perhaps there were/are some unexpected corner cases that weren't correctly handled?
    • A number of such cases have occurred (and been fixed)
    • Common cause: many kernel code paths that could formerly be exercised only by root can now be exercised by any user